System and method for estimating motion of target inside tissue based on surface deformation of soft tissue

ABSTRACT

Provided is a system and method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue. The system consists of an acquisition unit, a reference input unit, two surface extraction units, a target position extraction unit, a feature calculation unit, and a target motion estimation unit. The method includes: the acquisition unit acquires an image I_(i) of the soft tissue; the surface extraction unit extracts a surface f_(i) of the soft tissue from I_(i); the reference input unit acquires a reference image I_(ref) of the soft tissue; the surface extraction unit and the target position extraction unit respectively extract a reference surface f_(ref) of the soft tissue and a target reference position t_(ref) from I_(ref); the feature calculation unit calculates a deformation feature Ψ_(i) of f_(i) relative to f_(ref); and the target motion estimation unit estimates the target displacement based on Ψ_(i) and t_(ref).

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of International Application No. PCT/CN2021/080052, filed on Mar. 10, 2021, which claims priority to Chinese Application No. 202011118867.9, filed on Oct. 19, 2020, the contents of both of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present application relates to the field of motion estimation and, in particular, relates to a system and method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue.

BACKGROUND

Target motion estimation is an important research direction in the field of medical image processing. In a medical image, a target in a soft tissue readily moves with respiration or changes position under the influence of organ displacement and deformation. In clinical applications such as needle biopsy and real-time tracking radiotherapy, this motion brings risks to treatment accuracy; in the imaging field, it may cause edge blur or artifacts in the target area. Therefore, estimating the motion signal of a target has significant application value: in a clinical application environment, it can reduce the uncertainty of the treatment, and in the field of fast imaging, it is conducive to removing artifacts or blurring from the target area.

A traditional method for performing motion estimation on a target in a soft tissue is a technical solution based on registration, and the core thereof is to obtain an optimal deformation vector field to describe the displacement of each voxel in the soft tissue, so as to estimate the displacement of the target. The method can specifically include a deformation registration method based on gray scale and a deformation registration method combined with a biomechanical model.

The deformation registration method based on gray scale starts from a three-dimensional (3D) image (denoted as V₀). During the motion estimation process, a two-dimensional (2D) projection image (denoted as I_(p)) of the target tissue is acquired quickly by using X-ray imaging technology. With I_(p) as a reference, a new 3D image (denoted as V₁) is generated by deforming V₀ until the projection image I₁ of V₁ is optimally matched with I_(p); the corresponding V₁ then contains the motion displacement information of the target. The “optimal matching” in the method is usually based on a measurement criterion of image gray scale, which is easily affected by gray scale changes and noise. The time taken by the optimization iteration process required for registration also typically limits the rapidity of motion estimation. The registration combined with the biomechanical model integrates knowledge of the morphology, material analysis, tissue elasticity and the like of an anatomical structure into the registration process, which adds boundary condition constraints to the registration and improves the registration accuracy; however, it is difficult to accurately describe the biomechanical characteristics of the anatomical structure during this process.

SUMMARY

The technical problem to be solved by the present application is to overcome the shortcomings of the prior art and provide a system for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue. The system includes: an acquisition unit, a reference input unit, two surface extraction units, a target position extraction unit, a feature calculation unit, and a target motion estimation unit.

The acquisition unit is used for obtaining an actually captured image of the soft tissue; the reference input unit is used for inputting a reference image of the soft tissue; the surface extraction units are used for extracting soft tissue surfaces from the actually captured image and the reference image of the soft tissue; the target position extraction unit is used for extracting a reference position of the target from the reference image of the soft tissue; the feature calculation unit is used for calculating a deformation feature of the surface of the soft tissue in the actually captured image relative to the surface in the reference image; and the target motion estimation unit is used for calculating and outputting motion displacement estimation of the target based on the deformation feature and the reference position of the target.

Another object of the present application is to provide a method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue, which is achieved through the following steps:

S1: a reference input unit uses medical imaging equipment to capture a soft tissue image as a reference image I_(ref).

In at least one embodiment of the present application, the “medical imaging equipment” includes: computed tomography (CT), cone-beam computed tomography (CBCT), and ultrasonography.

In at least one embodiment of the present application, before the image is captured, one or more markers are implanted into a target area, and then the soft tissue image is captured by using the medical imaging equipment.

S2: a target position extraction unit identifies, calculates and outputs a reference position (denoted as t_(ref)) of the target from I_(ref).

S3: a surface extraction unit extracts a soft tissue surface (denoted as f_(ref)) from I_(ref).

In at least one embodiment of the present application, the method of “extracting the soft tissue surface” includes: an automatic edge recognition algorithm based on adaptive threshold segmentation or a fully convolutional neural network model.

S4: an acquisition unit uses medical imaging equipment to capture an actually captured image (denoted as I_(i)) of the soft tissue.

In at least one embodiment of the present application, the “medical imaging equipment” includes: CT, CBCT, and ultrasonography.

S5: the surface extraction unit extracts a soft tissue surface (denoted as f_(i)) from I_(i).

S6: a feature calculation unit calculates and outputs a deformation feature (denoted as Ψ_(i)) of f_(i) relative to f_(ref).

In at least one embodiment of the present application, the method of “calculating and outputting the deformation feature of f_(i) relative to f_(ref)” is: inputting f_(i) and f_(ref) into Ñ neural network models M={M_(j)|j=1, . . . , Ñ}, and the output results of the Ñ models together constitute the deformation feature Ψ_(i)={Ψ_(i,j)|j=1, . . . , Ñ}.

S7: Ψ_(i) and t_(ref) are input into a target motion estimation unit, and a target motion estimation model (m) in the unit calculates and outputs motion estimation (denoted as {circumflex over (t)}_(i)) of the target based on Ψ_(i) and t_(ref).
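The data flow of steps S1 to S7 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the present application: the class name MotionEstimator, its field names, and the toy stand-in functions are hypothetical, surfaces are assumed to be point arrays of shape (num_points, 3), and the deformation feature is assumed to be flattened into one vector.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


@dataclass
class MotionEstimator:
    """Hypothetical wiring of the units named in steps S1 to S7."""
    extract_surface: Callable[[np.ndarray], np.ndarray]   # surface extraction unit
    extract_target: Callable[[np.ndarray], np.ndarray]    # target position extraction unit
    feature_models: Sequence[Callable[[np.ndarray, np.ndarray], np.ndarray]]  # M_1..M_N~
    motion_model: Callable[[np.ndarray, np.ndarray], np.ndarray]  # model m

    def set_reference(self, image_ref: np.ndarray) -> None:
        # S2 and S3: reference target position and reference surface from I_ref.
        self.t_ref = self.extract_target(image_ref)
        self.f_ref = self.extract_surface(image_ref)

    def estimate(self, image_i: np.ndarray) -> np.ndarray:
        # S5: surface of the actually captured image.
        f_i = self.extract_surface(image_i)
        # S6: outputs of all feature models together form the deformation feature.
        psi_i = np.concatenate(
            [np.ravel(model(f_i, self.f_ref)) for model in self.feature_models]
        )
        # S7: the motion model maps (psi_i, t_ref) to the target estimate.
        return self.motion_model(psi_i, self.t_ref)


# Toy stand-ins (assumptions): the "image" is already a point set, the target is
# its centroid, and the single feature model is the mean surface offset.
ref_image = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
estimator = MotionEstimator(
    extract_surface=lambda img: img,
    extract_target=lambda img: img.mean(axis=0),
    feature_models=[lambda f_i, f_ref: (f_i - f_ref).mean(axis=0)],
    motion_model=lambda psi, t_ref: t_ref + psi[:3],
)
estimator.set_reference(ref_image)
t_hat = estimator.estimate(ref_image + np.array([1.0, 0.0, 0.0]))
```

In this toy setting a rigid shift of the surface by one unit along x moves the estimate by the same amount; a real system would replace every stand-in with the trained units described below.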

The design method of the “Ñ neural network models M={M_(j)|j=1, . . . , Ñ}” in the “feature calculation unit” in the step S6 is as follows.

(1) Modeling: establishing a fully convolutional neural network model (denoted as FCN), wherein an input layer of FCN is 2 pieces of surface data, the hidden layers include {l₁, . . . , l_(N−1)}, and an output layer is l_(N). A deformation vector field (denoted as ϕ_(k)) of the 2 pieces of surface data is output by l_(N).

(2) Collecting training data: using medical imaging equipment to collect multiple groups of soft tissue images I_(k), extracting changing surfaces f_(k) (k=1, 2, . . . , n) of the soft tissue from I_(k), taking any one of the surfaces {f_(k)|k=1, . . . , n} as a reference surface (denoted as f_(ref)), and taking the remaining surfaces as changing surfaces to form training sample pairs {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} together.

In at least one embodiment of the present application, the “medical imaging equipment” includes four-dimensional (4D) CT, 4D CBCT, and three-dimensional ultrasonography. The soft tissue surface is directly delineated, or identified from the collected images by using automatic threshold segmentation or the neural network.

(3) Training and optimizing FCN: inputting {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} into FCN, performing iterative optimization on the model by using unsupervised learning, and setting a loss function as the difference between f_(k) and a generated surface. The generated surface (ϕ_(k)f_(ref)) is obtained by applying ϕ_(k) to f_(ref), and the optimization is terminated when the loss function is optimal.

In at least one embodiment of the present application, the index for measuring the difference between ϕ_(k)f_(ref) and f_(k) is the sum of minimum distances from all points in ϕ_(k)f_(ref) to f_(k).

(4) Constructing M={M_(j)|j=1, . . . , Ñ}: in the layer structure {l₁, . . . , l_(N)} of the trained FCN, taking Ñ layers ({l_(k_(j))|j=1, 2, . . . , Ñ, k_(j)∈[1, N], Ñ≤N}) as the output layers of the Ñ models M_(j) respectively, wherein the input layer of each M_(j) is consistent with that of FCN, both being 2 pieces of surface data, and the hidden layers of each M_(j) are formed by sequentially ordering {l₁, . . . , l_(k_(j)−1)}.

If k_(j)=1, then M_(j) is only composed of an input layer and an output layer (l₁).

In at least one embodiment of the present application, the layers {l_(k_(j))|j=1, 2, . . . , Ñ, k_(j)∈[1, N], Ñ≤N} are preferably convolutional layers in the trained FCN, because the convolutional layer is a feature extraction layer in the neural network model. However, those skilled in the art can conceive, without creative work, that layers in the FCN other than the convolutional layers can also be included in the constructed model M, so as to increase the number of neural network models (M) in the feature calculation unit that is constructed accordingly. Therefore, similar technical solutions do not exceed the protection scope of the present application.
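The construction in step (4) above, where each M_(j) reuses the layers l₁ to l_(k_(j)) of the trained FCN, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names dense and build_submodels are hypothetical, the layers are untrained toy dense layers with random weights standing in for the layers of a trained FCN, and the 2 pieces of surface data are assumed to be point arrays concatenated along the coordinate axis.

```python
import numpy as np


def dense(weights):
    """Toy stand-in for one FCN layer: a linear map followed by ReLU."""
    return lambda x: np.maximum(x @ weights, 0.0)


def build_submodels(layers, selected):
    """Build one model per 1-based index k_j in `selected`.

    Model j feeds the paired surfaces through l_1..l_{k_j}, so l_{k_j} acts
    as its output layer and l_1..l_{k_j - 1} as its hidden layers.
    """
    def make(k):
        def model(f_i, f_ref):
            x = np.concatenate([f_i, f_ref], axis=1)  # 2 pieces of surface data
            for layer in layers[:k]:
                x = layer(x)
            return x
        return model
    return [make(k) for k in selected]


rng = np.random.default_rng(0)
# Untrained, illustrative layers l_1, l_2, l_3 (N = 3).
layers = [dense(rng.normal(size=shape)) for shape in [(6, 8), (8, 8), (8, 3)]]
models = build_submodels(layers, selected=[1, 2])  # N~ = 2, k_1 = 1, k_2 = 2

f_ref = rng.normal(size=(5, 3))
f_i = f_ref + 0.1
psi_i = [model(f_i, f_ref) for model in models]  # deformation feature {psi_i,j}
```

With k_(j)=1 the model reduces to the input plus l₁, matching the special case stated above; sharing the layer objects means no weights are copied or retrained.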

The design method of the “target motion estimation model (m)” in the step S7 is as follows.

(1) Data collection: using medical imaging equipment to capture multiple groups of soft tissue images I_(p), and identifying and calculating a soft tissue surface f_(p) and a target position t_(p) from I_(p), wherein p=1, 2, . . . , n′.

In at least one embodiment of the present application, the “medical imaging equipment” includes 4D CT and three-dimensional ultrasonography. The soft tissue images acquired by 4D CT are 10 groups of 3D CT scans which are uniformly sampled within one breathing cycle. Each 3D CT image contains a 3D image of the soft tissue surface and the target. Alternatively, by implanting one or more markers into the target area of the soft tissue, surface changes of the soft tissue and the displacement of the markers are captured by using the 3D ultrasonography. The soft tissue surface and the target position are directly delineated or identified by using automatic threshold segmentation or the neural network from the collected images.

(2) Calculating a deformation feature Ψ_(p) to form training data: randomly taking a group from the collected {(f_(p), t_(p))|p=1, 2, . . . , n′} as reference samples, marking the same as (f_(ref), t_(ref)), taking the rest as change samples {(f_(p), t_(p))|p=1, 2, . . . , n′ and p≠ref}, and inputting {(f_(p), f_(ref))|p=1, 2, . . . , n′ and p≠ref} into the feature calculation unit to generate the deformation feature Ψ_(p). The deformation feature constitutes training data {(Ψ_(p), t_(p), t_(ref))|p=1, 2, . . . , n′ and p≠ref} together with t_(p) and t_(ref).

(3) Fitting the target motion estimation model (m): the input data of the model is t_(ref) and Ψ_(p), the output is the displacement estimation (denoted as {circumflex over (t)}_(p)) of the target. Through iterative optimization, the difference between {circumflex over (t)}_(p) output by the model and a true value t_(p) thereof is minimized.

In at least one embodiment of the present application, the “difference between {circumflex over (t)}_(p) output by the model and the true value t_(p) thereof” is measured by the Euclidean distance between {circumflex over (t)}_(p) and t_(p).

In at least one embodiment of the present application, the “target motion estimation model (m)” includes: a neural network model, a fully convolutional neural network model, a linear model, and a support vector machine model.
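For the linear model option listed above, the fitting in step (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the deformation feature is assumed to be a flat vector per sample, the data are synthetic, and a closed-form least-squares solve (numpy.linalg.lstsq) is swapped in for the iterative optimization described in the text. The Euclidean distance between the estimate and the true value is used as the error measure, as stated above.

```python
import numpy as np


def design_matrix(psi, t_ref):
    """Stack [psi_p, t_ref, 1] row by row; t_ref is repeated for every sample."""
    n = psi.shape[0]
    return np.hstack([psi, np.tile(t_ref, (n, 1)), np.ones((n, 1))])


def fit_linear_model(psi, t_ref, t_true):
    """Least-squares weights mapping (psi_p, t_ref) to the target position t_p."""
    weights, *_ = np.linalg.lstsq(design_matrix(psi, t_ref), t_true, rcond=None)
    return weights


def predict(weights, psi, t_ref):
    return design_matrix(psi, t_ref) @ weights


def mean_euclidean_error(t_hat, t_true):
    """Mean Euclidean distance between estimated and true target positions."""
    return np.linalg.norm(t_hat - t_true, axis=1).mean()


# Synthetic data (assumption): 9 change samples with a 4-dimensional feature,
# and target positions that depend linearly on the feature.
rng = np.random.default_rng(0)
psi = rng.normal(size=(9, 4))
t_ref = np.array([10.0, 20.0, 30.0])
true_map = rng.normal(size=(4, 3))
t_true = t_ref + psi @ true_map

weights = fit_linear_model(psi, t_ref, t_true)
error = mean_euclidean_error(predict(weights, psi, t_ref), t_true)
```

Because a single reference is used, the t_ref columns are constant and the design matrix is rank deficient; lstsq returns the minimum-norm solution in that case, so the fit is still well defined. A neural network or support vector machine model would replace fit_linear_model and predict while keeping the same inputs and error measure.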

The present application provides a system and method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue. The beneficial effects of the system and the method are as follows. In the prior art, the displacement of the target is estimated by the deformation registration method. On the one hand, the registration process obtains an optimal solution through repeated calculations, which increases the time delay for estimating the displacement of the target; on the other hand, the optimal solution for registration is often based on the gray scale similarity between a deformed image and a reference image, so the accuracy of target displacement estimation in the prior art is affected by the gray scale difference, low contrast, noise and the like of the images. Different from the prior art, the solution provided by the present application fits the deformation feature to the displacement of the target and does not include a process of repeated calculations, thus ensuring the timeliness of the estimation. In addition, the deformation feature mentioned in the present application concerns only the deformation of the soft tissue surface: unlike the prior art, where the entire image is used as a registration object, the present application focuses only on the deformation of the surface profile, thus reducing the impact of the image quality on the accuracy. Therefore, the present application can guarantee the accuracy and the timeliness of target displacement estimation at the same time, and has a significant advantage compared with the prior art.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of a system for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue.

FIG. 2 is a flow diagram of a method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue.

FIG. 3 is a schematic diagram of a design method of a plurality of neural network models in a feature calculation unit.

FIG. 4 is a schematic diagram of a design method of a target motion estimation model.

DESCRIPTION OF EMBODIMENTS

The present application will be further explained in conjunction with the drawings and examples.

Example 1

A system for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue, as shown in FIG. 1, includes: an acquisition unit, a reference input unit, two surface extraction units, a target position extraction unit, a feature calculation unit, and a target motion estimation unit, wherein the acquisition unit is used for obtaining an actually captured image I_(i) of the soft tissue; the reference input unit is used for inputting a reference image I_(ref) of the soft tissue; the surface extraction units are used for extracting soft tissue surfaces from the actually captured image I_(i) and the reference image I_(ref) of the soft tissue, and the extracted surfaces are respectively expressed as f_(i) and f_(ref); the target position extraction unit is used for extracting a reference position t_(ref) of the target from the reference image I_(ref) of the soft tissue; the feature calculation unit is used for calculating a deformation feature (denoted as Ψ_(i)) of the surface f_(i) of the soft tissue in the actually captured image relative to the surface f_(ref) in the reference image; and the target motion estimation unit is used for calculating and outputting motion displacement estimation {circumflex over (t)}_(i) of the target based on the deformation feature Ψ_(i) and the reference position t_(ref) of the target.

Example 2

A method for estimating the motion of a target inside a tissue based on surface deformation of the soft tissue, as shown in FIG. 2, is achieved through the following steps:

S1: a reference input unit uses medical imaging equipment to capture a soft tissue image as a reference image I_(ref).

In at least one embodiment of the present application, the “medical imaging equipment” includes: CT, CBCT and ultrasonography.

In at least one embodiment of the present application, before the image is captured, one or more markers are implanted into a target area, and then the soft tissue image is captured by using the medical imaging equipment.

S2: a target position extraction unit identifies, calculates and outputs a reference position (denoted as t_(ref)) of the target from I_(ref).

S3: a surface extraction unit extracts a soft tissue surface (denoted as f_(ref)) from I_(ref).

In at least one embodiment of the present application, the method of “extracting the soft tissue surface” includes: an automatic edge recognition algorithm based on adaptive threshold segmentation or a fully convolutional neural network model.

S4: an acquisition unit uses medical imaging equipment to capture an actually captured image (denoted as I_(i)) of the soft tissue.

In at least one embodiment of the present application, the “medical imaging equipment” includes: CT, CBCT, and ultrasonography.

S5: the surface extraction unit extracts a soft tissue surface (denoted as f_(i)) from I_(i).

S6: a feature calculation unit calculates and outputs a deformation feature (denoted as Ψ_(i)) of f_(i) relative to f_(ref).

In at least one embodiment of the present application, the method of “calculating and outputting the deformation feature of f_(i) relative to f_(ref)” is: inputting f_(i) and f_(ref) into Ñ neural network models M={M_(j)|j=1, . . . , Ñ}, and the output results of the Ñ models together constitute the deformation feature Ψ_(i)={Ψ_(i,j)|j=1, . . . , Ñ}.

S7: Ψ_(i) and t_(ref) are input into a target motion estimation unit, and a target motion estimation model (m) in the unit calculates and outputs motion estimation (denoted as {circumflex over (t)}_(i)) of the target based on Ψ_(i) and t_(ref).
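The surface extraction used in steps S3 and S5 above can be illustrated on a 2D slice as follows. This is only a sketch under stated assumptions: the function names are hypothetical, Otsu's global threshold is swapped in as a simple automatic threshold in place of the adaptive threshold segmentation named in the text, the test image is synthetic, and the surface is taken to be the foreground pixels that have at least one background 4-neighbour.

```python
import numpy as np


def otsu_threshold(image, bins=64):
    """Automatic threshold that maximizes the between-class variance (Otsu)."""
    hist, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                 # pixel count at or below each bin
    w1 = w0[-1] - w0                     # pixel count above each bin
    m0 = np.cumsum(hist * centers)       # cumulative intensity sum
    mu0 = m0 / np.maximum(w0, 1)         # mean of the lower class
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]


def boundary_pixels(mask):
    """Coordinates of foreground pixels with at least one background 4-neighbour."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return np.argwhere(mask & ~interior)


# Synthetic slice (assumption): a bright 4x4 "tissue" block on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
mask = image > otsu_threshold(image)
surface = boundary_pixels(mask)  # (row, col) coordinates of the extracted contour
```

For 3D images the same idea applies with 6-neighbour checks, and a fully convolutional neural network, the other option named in the text, could replace the thresholding step while keeping the boundary extraction.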

The design method of the “Ñ neural network models M={M_(j)|j=1, . . . , Ñ}” in the “feature calculation unit” in the step S6 is:

(1) Modeling: establishing a fully convolutional neural network model (denoted as FCN), wherein an input layer of FCN is 2 pieces of surface data, the hidden layers include {l₁, . . . , l_(N−1)}, and an output layer is l_(N). A deformation vector field ϕ_(k) of the 2 pieces of surface data is output by l_(N).

(2) Collecting training data: using medical imaging equipment to collect multiple groups of soft tissue images I_(k), extracting changing surfaces f_(k) (k=1, 2, . . . , n) of the soft tissue from I_(k), taking any one of the surfaces {f_(k)|k=1, . . . , n} as a reference surface (denoted as f_(ref)), and taking the remaining surfaces as changing surfaces to form training sample pairs {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} together.

In at least one embodiment of the present application, the “medical imaging equipment” includes 4D CT, 4D CBCT, and three-dimensional ultrasonography. The soft tissue surface is directly delineated or identified by using automatic threshold segmentation or the neural network from the collected images.

(3) Training and optimizing FCN: in the embodiment shown in FIG. 3, inputting {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} into FCN, performing iterative optimization on the model by using unsupervised learning, and setting a loss function as the difference between f_(k) and a generated surface. The generated surface (ϕ_(k)f_(ref)) is obtained by applying ϕ_(k) to f_(ref), and the optimization is terminated when the loss function is optimal.

In at least one embodiment of the present application, the index for measuring the difference between ϕ_(k)f_(ref) and f_(k) is the sum of minimum distances from all points in ϕ_(k)f_(ref) to f_(k).

(4) Constructing M={M_(j)|j=1, . . . , Ñ}: in the embodiment shown in FIG. 3, Ñ=N, that is, all layers in the trained FCN are used to construct N neural network models M={M_(j)|j=1, . . . , N}. The input layer of each M_(j) is consistent with that of FCN, both of which are 2 pieces of surface data (f_(k), f_(ref)), the output layer is l_(j), and the hidden layers are formed by sequentially ordering {l₁, . . . , l_(j−1)}.

If j=1, then M₁ is only composed of an input layer and an output layer (l₁).
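The loss used to train the FCN in step (3) of this example, namely the sum of minimum distances from all points of the generated surface ϕ_(k)f_(ref) to f_(k), can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, surfaces are assumed to be point arrays of shape (num_points, 3), ϕ_(k) is assumed to be one displacement vector per reference point, and the point-to-point Euclidean distance is assumed as the distance.

```python
import numpy as np


def apply_field(f_ref, phi):
    """Generated surface: displace each reference point by its vector in phi."""
    return f_ref + phi


def surface_loss(generated, f_k):
    """Sum, over all generated points, of the distance to the nearest point of f_k."""
    diffs = generated[:, None, :] - f_k[None, :, :]   # all pairwise differences
    dists = np.linalg.norm(diffs, axis=2)             # (num_generated, num_f_k)
    return dists.min(axis=1).sum()


f_ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
f_k = f_ref + np.array([0.0, 0.0, 0.5])               # surface lifted by 0.5 along z

# A zero field leaves every point 0.5 away from its nearest neighbour on f_k.
loss_zero_field = surface_loss(apply_field(f_ref, np.zeros_like(f_ref)), f_k)
# The field that lifts every point by 0.5 reproduces f_k exactly.
true_field = np.tile([0.0, 0.0, 0.5], (3, 1))
loss_true_field = surface_loss(apply_field(f_ref, true_field), f_k)
```

The loss needs no ground-truth deformation field, only the two surfaces, which is consistent with the unsupervised training described above. Note that this distance is one-sided: it measures how far the generated surface is from f_(k), not the reverse.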

The design method of the “target motion estimation model (m)” in the step S7 is:

(1) Data collection: using medical imaging equipment to capture multiple groups of soft tissue images I_(p), and identifying and calculating a soft tissue surface f_(p) and a target position t_(p) from I_(p), wherein p=1, 2, . . . , n′.

In at least one embodiment of the present application, the “medical imaging equipment” includes 4D CT and three-dimensional ultrasonography. The soft tissue images acquired by 4D CT are 10 groups of 3D CT scans which are uniformly sampled within one breathing cycle. Each 3D CT image contains a 3D image of the soft tissue surface and the target. Alternatively, by implanting one or more markers into the target area of the soft tissue, surface changes of the soft tissue and the displacement of the markers are captured by using the 3D ultrasonography. The soft tissue surface and the target position are directly delineated or identified by using automatic threshold segmentation or the neural network from the collected images.

(2) Calculating a deformation feature Ψ_(p) to form training data: randomly taking a group from the collected {(f_(p), t_(p))|p=1,2, . . . n′} as reference samples, marking the same as (f_(ref), t_(ref)), taking the rest as change samples {(f_(p),t_(p))|p=1,2, . . . ,n′ and p≠ref}, and inputting {(f_(p),f_(ref))|p=1,2, . . . , n′ and p≠ref} into the feature calculation unit to generate the deformation feature Ψ_(p). The deformation feature constitutes training data {(Ψ_(p), t_(p), t_(ref))|p=1,2, . . . , n′ and p≠ref} together with t_(p) and t_(ref).

(3) Fitting the target motion estimation model (m): in the embodiment shown in FIG. 4, the input data of the model is t_(ref) and Ψ_(p), the output is the displacement estimation (denoted as {circumflex over (t)}_(p)) of the target. Through iterative optimization, the difference between {circumflex over (t)}_(p) output by the model and a true value t_(p) thereof is minimized.

In at least one embodiment of the present application, the “difference between {circumflex over (t)}_(p) output by the model and the true value t_(p) thereof” is measured by the Euclidean distance between {circumflex over (t)}_(p) and t_(p).

In at least one embodiment of the present application, the “target motion estimation model (m)” includes: a neural network model, a fully convolutional neural network model, a linear model, and a support vector machine model.
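The assembly of training data in step (2) of this design method can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the samples are synthetic, and a mean surface offset stands in for the feature calculation unit. With the 10 groups of 3D CT scans mentioned above (n′=10), one group becomes the reference and the remaining 9 become training triples (Ψ_(p), t_(p), t_(ref)).

```python
import random

import numpy as np


def build_training_data(samples, feature_unit, rng):
    """Pick a random reference sample and pair every other sample with it.

    `samples` is a list of (f_p, t_p); the result is the reference index and
    a list of triples (psi_p, t_p, t_ref) for all p != ref.
    """
    ref = rng.randrange(len(samples))
    f_ref, t_ref = samples[ref]
    triples = [
        (feature_unit(f_p, f_ref), t_p, t_ref)
        for p, (f_p, t_p) in enumerate(samples)
        if p != ref
    ]
    return ref, triples


def mean_offset(f_p, f_ref):
    """Toy stand-in for the feature calculation unit (assumption)."""
    return (f_p - f_ref).mean(axis=0)


# Synthetic breathing cycle (assumption): surface and target rise together along z.
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
samples = [
    (base + np.array([0.0, 0.0, 0.1 * p]), np.array([0.5, 0.5, 0.1 * p]))
    for p in range(10)
]
ref, triples = build_training_data(samples, mean_offset, random.Random(0))
```

Passing the random generator in explicitly keeps the choice of reference reproducible, which helps when comparing fitted models across runs.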

The core of the technical solution of the present application for realizing target displacement estimation is fitting, that is, correlating the target displacement with the deformation feature of the soft tissue surface using a mathematical expression. Since the speed at which the mathematical expression gives an estimated value depends only on the calculation speed of the computer, the present application can ensure the timeliness of estimation. The source of the motion of the target in the soft tissue is the deformation of the soft tissue, and the most obvious deformation is the change of its surface. Therefore, in the solution provided by the present application, the feature describing the surface deformation of the soft tissue is used as the association object, and the deformation feature is extracted by reconstructing the output of the hidden layers of the neural network. The function of the neural network is to match the actually measured surface with the reference surface, and in order to avoid the influence of the gray scale of the image, the matched object is a surface contour extracted from the image.

What is claimed is:
 1. A system for estimating the motion of a target inside a soft tissue based on surface deformation of the tissue, wherein the system is composed of an acquisition unit, a reference input unit, two surface extraction units, a target position extraction unit, a feature calculation unit, and a target motion estimation unit.
 2. The system for estimating the motion of a target inside a soft tissue based on surface deformation of the tissue according to claim 1, wherein the acquisition unit is used for obtaining an actually captured image of the soft tissue; the reference input unit is used for inputting a reference image of the soft tissue; the surface extraction units are used for extracting soft tissue surfaces from the actually captured image and the reference image of the soft tissue; the target position extraction unit is used for extracting a reference position of the target from the reference image of the soft tissue; the feature calculation unit is used for calculating a deformation feature of the surface of the soft tissue in the actually captured image relative to the surface in the reference image; and the target motion estimation unit is used for calculating and outputting a motion displacement estimation of the target based on the deformation feature and the reference position of the target.
 3. A method for estimating the motion of a target inside a soft tissue based on surface deformation of the tissue, wherein the method is achieved through the following steps: S1: a reference input unit capturing a soft tissue image as a reference image I_(ref) by using medical imaging equipment; S2: a target position extraction unit identifying, calculating and outputting a reference position (denoted as t_(ref)) of the target from I_(ref); S3: a surface extraction unit extracting a soft tissue surface (denoted as f_(ref)) from I_(ref), wherein a method of extracting the soft tissue surface comprises an automatic edge recognition algorithm by applying adaptive threshold segmentation or a fully convolutional neural network model; S4: an acquisition unit capturing an actually captured image (denoted as I_(i)) of the soft tissue by using the medical imaging equipment; S5: the surface extraction unit extracting a soft tissue surface (denoted as f_(i)) from I_(i); S6: a feature calculation unit calculating and outputting a deformation feature (denoted as Ψ_(i)) of f_(i) relative to f_(ref), a method of calculating and outputting the deformation feature of f_(i) relative to f_(ref) is: inputting f_(i) and f_(ref) into Ñ neural network models M={M_(j)|j=1, . . . , Ñ}, and output results of the Ñ models together constitute the deformation feature Ψ_(i)={Ψ_(i,j)|j=1, . . . , Ñ}; and S7: inputting Ψ_(i) and t_(ref) into a target motion estimation unit, and a target motion estimation model (m) in the unit calculating and outputting a motion estimation (denoted as {circumflex over (t)}_(i)) of the target based on Ψ_(i) and t_(ref).
 4. The method according to claim 3, wherein the medical imaging equipment in the steps S1 and S4 comprises: computed tomography (CT), cone-beam CT (CBCT), and ultrasonography, and in the step S1, before the image is captured, one or more markers are implanted into a target area, and then the soft tissue image is captured by using the medical imaging equipment.
 5. The method according to claim 3, wherein a design method of the Ñ neural network models M={M_(j)|j=1, . . . , Ñ} in the feature calculation unit in the step S6 is: (1) modeling: establishing a fully convolutional neural network model (denoted as FCN), wherein an input layer of FCN is 2 pieces of surface data, hidden layers comprise {l₁, . . . , l_(N−1)}, and an output layer is l_(N); and a deformation vector field (denoted as ϕ_(k)) of the 2 pieces of surface data is output by l_(N); (2) collecting training data: using the medical imaging equipment to collect multiple groups of soft tissue images I_(k), extracting changing surfaces f_(k) (k=1, 2, . . . , n) of the soft tissue from I_(k), taking any one of the surfaces {f_(k)|k=1, . . . , n} as a reference surface (denoted as f_(ref)), and taking the rest surfaces as changing surfaces to form training sample pairs {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} together; (3) training and optimizing FCN: inputting {(f_(k), f_(ref))|k=1, . . . , n and k≠ref} into FCN, performing iterative optimization on the model by using unsupervised learning, setting a loss function as the difference between f_(k) and a generated surface, wherein the generated surface (ϕ_(k)f_(ref)) is achieved by applying ϕ_(k) on f_(ref), and when the loss function is optimal, terminating the optimization; wherein an index for measuring the difference between ϕ_(k)f_(ref) and f_(k) is a sum of minimum distances from all points in ϕ_(k)f_(ref) to f_(k); (4) constructing M={M_(j)|j=1, . . . , Ñ}: in the layer structure {l₁, . . . , l_(N)} of the trained FCN, taking Ñ layers ({l_(k_(j))|j=1, 2, . . . , Ñ, k_(j)∈[1, N], Ñ≤N}) as the output layers of Ñ M_(j)s, respectively, wherein the input layer is consistent with that of FCN, both of which are two pieces of surface data, and the hidden layers of each M_(j) are formed by sequentially ordering {l₁, . . . , l_(k_(j)−1)}.
 6. The method according to claim 5, wherein the medical imaging equipment in the step (2) comprises four-dimension (4D) CT, 4D CBCT, and three-dimensional ultrasonography, and the soft tissue surface is directly delineated or identified by using automatic threshold segmentation or the neural network from the collected images.
 7. The method according to claim 5, wherein in the step (4), if k_(j)=1, then M_(j) is only composed of an input layer and an output layer (l₁), and the {l_(k_(j))|j=1, 2, . . . , Ñ, k_(j)∈[1, N], Ñ≤N} is preferably a convolutional layer in the trained FCN.
 8. The method according to claim 3, wherein a design method of the target motion estimation model (m) in the step S7 is: (1) data collection: using medical imaging equipment to capture multiple groups of soft tissue images I_(p), and identifying and calculating a soft tissue surface f_(p) and a target position t_(p) from I_(p), where p=1, 2, . . . , n′; (2) calculating a deformation feature Ψ_(p) to form training data: randomly taking a group from the collected {(f_(p), t_(p))|p=1, 2, . . . , n′} to serve as reference samples denoted as (f_(ref), t_(ref)), taking the rest as change samples {(f_(p), t_(p))|p=1, 2, . . . , n′ and p≠ref}, and inputting {(f_(p), f_(ref))|p=1, 2, . . . , n′ and p≠ref} into the feature calculation unit to generate the deformation feature Ψ_(p), wherein the deformation feature constitutes training data {(Ψ_(p), t_(p), t_(ref))|p=1, 2, . . . , n′ and p≠ref} together with t_(p) and t_(ref); and (3) fitting the target motion estimation model (m): the input data of the model is t_(ref) and Ψ_(p), the output is the displacement estimation (denoted as {circumflex over (t)}_(p)) of the target; through iterative optimization, a difference between {circumflex over (t)}_(p) output by the model and a true value t_(p) thereof is minimized, the difference between {circumflex over (t)}_(p) output by the model and the true value t_(p) thereof is measured by a Euclidean distance between {circumflex over (t)}_(p) and t_(p).
 9. The method according to claim 8, wherein the medical imaging equipment in the step (1) comprises 4D CT and three-dimensional ultrasonography, the soft tissue images acquired by 4D CT are 10 groups of 3D CT scans which are uniformly sampled within one breathing cycle; each 3D CT image contains a 3D image of the soft tissue surface and the target, or, by implanting one or more markers into the target area of the soft tissue, surface changes of the soft tissue and the displacement of the markers are captured by using the 3D ultrasonography, and the soft tissue surface and the target position are directly delineated, or identified by using automatic threshold segmentation or the neural network from the collected images.
 10. The method according to claim 8, wherein the target motion estimation model (m) in the step (3) comprises: a neural network model, a fully convolutional neural network model, a linear model, and a support vector machine model. 